Selecting subsets of features that differentiate between two conditions is a key task in a broad range of scientific domains. In many applications, the features of interest form clusters with similar effects on the data at hand. To recover such clusters we develop DiSC, a data-driven approach for detecting groups of features that differentiate between conditions. For each condition, we construct a graph whose nodes correspond to the features and whose weights are functions of the similarity between them for that condition. We then apply a spectral approach to compute subsets of nodes whose connectivity differs significantly between the condition-specific feature graphs. On the theoretical front, we analyze our approach with a toy example based on the stochastic block model. We evaluate DiSC on a variety of datasets, including MNIST, hyperspectral imaging, simulated scRNA-seq and task fMRI, and demonstrate that DiSC uncovers features that better differentiate between conditions compared to competing methods.
translated by 谷歌翻译
通过潜在树形图形模型建模高维数据的分布是多个科学域中的一种普遍存在的方法。常见的任务是推断底层树结构,仅给出其终端节点的观察。树恢复的许多算法是计算密集型的,这将其适用于中等大小的树木。对于大树,一种共同的方法,被称为剥夺和征服,是以两步恢复树结构。首先,将结构分别恢复终端节点的多个可能随机子集。其次,合并生成的子树以形成一棵树。在这里,我们开发频谱自上而下的恢复(STDR),确定性分割和征服方法来推断出大潜在树模型。与以前的方法不同,STDR基于与观察到的节点相关的合适的LAPLACIAN矩阵的FIEDLER向量,以非随机方式分配终端节点。我们证明,在某些条件下,这种分区与树结构一致。反过来,这导致了小远子的显着更简单的合并程序。我们证明了STDR在统计上是一致的,并绑定了以高概率准确恢复树所需的样本数量。使用来自近几种常见树模型的模拟数据在系统发育中,我们证明STDR在运行时具有显着的优势,具有改善或类似的准确性。
translated by 谷歌翻译
We introduce anchored radial observations (ARO), a novel shape encoding for learning neural field representation of shapes that is category-agnostic and generalizable amid significant shape variations. The main idea behind our work is to reason about shapes through partial observations from a set of viewpoints, called anchors. We develop a general and unified shape representation by employing a fixed set of anchors, via Fibonacci sampling, and designing a coordinate-based deep neural network to predict the occupancy value of a query point in space. Differently from prior neural implicit models, that use global shape feature, our shape encoder operates on contextual, query-specific features. To predict point occupancy, locally observed shape information from the perspective of the anchors surrounding the input query point are encoded and aggregated through an attention module, before implicit decoding is performed. We demonstrate the quality and generality of our network, coined ARO-Net, on surface reconstruction from sparse point clouds, with tests on novel and unseen object categories, "one-shape" training, and comparisons to state-of-the-art neural and classical methods for reconstruction and tessellation.
translated by 谷歌翻译
Action recognition models have achieved impressive results by incorporating scene-level annotations, such as objects, their relations, 3D structure, and more. However, obtaining annotations of scene structure for videos requires a significant amount of effort to gather and annotate, making these methods expensive to train. In contrast, synthetic datasets generated by graphics engines provide powerful alternatives for generating scene-level annotations across multiple tasks. In this work, we propose an approach to leverage synthetic scene data for improving video understanding. We present a multi-task prompt learning approach for video transformers, where a shared video transformer backbone is enhanced by a small set of specialized parameters for each task. Specifically, we add a set of ``task prompts'', each corresponding to a different task, and let each prompt predict task-related annotations. This design allows the model to capture information shared among synthetic scene tasks as well as information shared between synthetic scene tasks and a real video downstream task throughout the entire network. We refer to this approach as ``Promptonomy'', since the prompts model a task-related structure. We propose the PromptonomyViT model (PViT), a video transformer that incorporates various types of scene-level information from synthetic data using the ``Promptonomy'' approach. PViT shows strong performance improvements on multiple video understanding tasks and datasets.
translated by 谷歌翻译
Understanding the 3D world from 2D images involves more than detection and segmentation of the objects within the scene. It also includes the interpretation of the structure and arrangement of the scene elements. Such understanding is often rooted in recognizing the physical world and its limitations, and in prior knowledge as to how similar typical scenes are arranged. In this research we pose a new challenge for neural network (or other) scene understanding algorithms - can they distinguish between plausible and implausible scenes? Plausibility can be defined both in terms of physical properties and in terms of functional and typical arrangements. Hence, we define plausibility as the probability of encountering a given scene in the real physical world. We build a dataset of synthetic images containing both plausible and implausible scenes, and test the success of various vision models in the task of recognizing and understanding plausibility.
translated by 谷歌翻译
In this paper, we present a method for converting a given scene image into a sketch using different types and multiple levels of abstraction. We distinguish between two types of abstraction. The first considers the fidelity of the sketch, varying its representation from a more precise portrayal of the input to a looser depiction. The second is defined by the visual simplicity of the sketch, moving from a detailed depiction to a sparse sketch. Using an explicit disentanglement into two abstraction axes -- and multiple levels for each one -- provides users additional control over selecting the desired sketch based on their personal goals and preferences. To form a sketch at a given level of fidelity and simplification, we train two MLP networks. The first network learns the desired placement of strokes, while the second network learns to gradually remove strokes from the sketch without harming its recognizability and semantics. Our approach is able to generate sketches of complex scenes including those with complex backgrounds (e.g., natural and urban settings) and subjects (e.g., animals and people) while depicting gradual abstractions of the input scene in terms of fidelity and simplicity.
translated by 谷歌翻译
Recent advances in deep learning techniques and applications have revolutionized artistic creation and manipulation in many domains (text, images, music); however, fonts have not yet been integrated with deep learning architectures in a manner that supports their multi-scale nature. In this work we aim to bridge this gap, proposing a network architecture capable of rasterizing glyphs in multiple sizes, potentially paving the way for easy and accessible creation and manipulation of fonts.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have achieved great successes in many learning tasks performed on graph structures. Nonetheless, to propagate information GNNs rely on a message passing scheme which can become prohibitively expensive when working with industrial-scale graphs. Inspired by the PPRGo model, we propose the CorePPR model, a scalable solution that utilises a learnable convex combination of the approximate personalised PageRank and the CoreRank to diffuse multi-hop neighbourhood information in GNNs. Additionally, we incorporate a dynamic mechanism to select the most influential neighbours for a particular node which reduces training time while preserving the performance of the model. Overall, we demonstrate that CorePPR outperforms PPRGo, particularly on large graphs where selecting the most influential nodes is particularly relevant for scalability. Our code is publicly available at: https://github.com/arielramos97/CorePPR.
translated by 谷歌翻译
在本文中,我们将Wiener-Ito混乱分解扩展到扩散过程的类别,其漂移和扩散系数具有线性生长。通过省略混乱扩展中的正交性,我们能够证明,对于[1,\ infty)$中的$ p \ in [1,\ infty)$的每个$ p $积分功能都可以表示为基础过程的迭代积分的总和。使用此扩展的截断和(可能是随机的)神经网络的截断总和,在机器学习设置中学习了参数,我们证明,每个财务衍生物都可以在$ l^p $ sense中任意地近似。此外,可以以封闭形式计算近似财务导数的对冲策略。
translated by 谷歌翻译